Elasca: Workload-Aware Elastic Scalability for Partition Based Database Systems
Abstract
The ability to increase or decrease allocated resources on demand as the transactional load varies is essential for database management systems (DBMS) deployed on today’s computing platforms, such as the cloud. The need to maintain database consistency at very large scales, while providing high performance and reliability, makes elasticity particularly challenging. In this thesis, we exploit data partitioning as a way to provide elastic DBMS scalability. We assert that the flexibility provided by a partitioned, shared-nothing parallel DBMS can be used to implement elasticity. Our idea is to start with a small number of servers that manage all the partitions, and to scale out elastically by dynamically adding new servers and redistributing database partitions among these servers as the load varies. Implementing this approach requires (a) efficient mechanisms for adding and removing servers and migrating partitions, and (b) policies that efficiently determine the optimal placement of partitions on the given servers, as well as plans for partition migration. This thesis presents Elasca, a system that implements both of these features in an existing shared-nothing DBMS (namely VoltDB) to provide automatic elastic scalability. Elasca consists of a mechanism for enabling elastic scalability and a workload-aware optimizer for determining optimal partition placement and migration plans. Our optimizer minimizes the computing resources required and balances load effectively without compromising system performance, even in the presence of variations in the intensity and skew of the load. The results of our experiments show that Elasca achieves performance close to that of a fully provisioned system while using 35% fewer resources on average. Furthermore, Elasca’s workload-aware optimizer performs up to 79% less data movement than a greedy approach to resource minimization, and also balances load much more effectively.
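The placement problem described in the abstract (use as few servers as possible, keep load balanced, and move as few partitions as possible) can be illustrated with a small sketch. This is not Elasca's optimizer, whose formulation the abstract does not give; it is a simple heuristic written under stated assumptions: each partition has a single numeric load, every server has the same capacity, and the hypothetical names `plan_placement`, `_try_pack`, and `count_migrations` exist only for this illustration.

```python
import math
from typing import Dict, List, Optional


def _try_pack(load: Dict[str, float], current: Dict[str, int],
              capacity: float, n: int) -> Optional[Dict[str, int]]:
    """Try to fit all partitions on servers 0..n-1; return None if impossible."""
    used: List[float] = [0.0] * n
    placement: Dict[str, int] = {}
    # Place the heaviest partitions first (ties broken by name for determinism).
    for part in sorted(load, key=lambda p: (-load[p], p)):
        home = current.get(part)
        if home is not None and 0 <= home < n and used[home] + load[part] <= capacity:
            # Staying on the current server avoids a migration.
            target = home
        else:
            fits = [s for s in range(n) if used[s] + load[part] <= capacity]
            if not fits:
                return None
            # Otherwise pick the least-loaded server that fits, to balance load.
            target = min(fits, key=lambda s: (used[s], s))
        used[target] += load[part]
        placement[part] = target
    return placement


def plan_placement(load: Dict[str, float], current: Dict[str, int],
                   capacity: float, max_servers: int) -> Dict[str, int]:
    """Return a partition -> server map using the fewest servers the heuristic finds."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    # Lower bound on the server count: total load divided by per-server capacity.
    start = max(1, math.ceil(sum(load.values()) / capacity))
    for n in range(start, max_servers + 1):
        placement = _try_pack(load, current, capacity, n)
        if placement is not None:
            return placement
    raise ValueError("load does not fit on the available servers")


def count_migrations(current: Dict[str, int], placement: Dict[str, int]) -> int:
    """Number of partitions whose server changes under the new placement."""
    return sum(1 for p, s in placement.items() if current.get(p) != s)


if __name__ == "__main__":
    load = {"p0": 40, "p1": 30, "p2": 20, "p3": 10}
    current = {"p0": 0, "p1": 0, "p2": 0, "p3": 0}  # everything starts on server 0
    new = plan_placement(load, current, capacity=60, max_servers=4)
    print(new, "migrations:", count_migrations(current, new))
```

With the sample input, all four partitions start on one server whose capacity (60) is below the total load (100), so the sketch scales out to two servers and moves only the partitions that no longer fit. When the load drops, the same function packs partitions back onto fewer servers. A real optimizer would also need to account for migration cost per partition and for load skew over time, which this heuristic ignores.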
Similar resources
Dynamic Workload-Aware Elastic Scale-Out in Cloud Data Stores
NoSQL databases store a huge amount of data generated by modern web applications. To improve scalability, a database is partitioned and distributed among different nodes, a process known as scale-out. However, this scale-out feature of NoSQL databases is oblivious to the data access patterns of the web applications, which results in data being poorly distributed across the nodes. Therefore, the cos...
Loom: Query-aware Partitioning of Online Graphs
As with general graph processing systems, partitioning data over a cluster of machines improves the scalability of graph database management systems. However, these systems will incur additional network cost during the execution of a query workload, due to interpartition traversals. Workload-agnostic partitioning algorithms typically minimise the likelihood of any edge crossing partition bounda...
Hyper-Graph Based Database Partitioning for Transactional Workloads
A common approach to scaling transactional databases in practice is horizontal partitioning, which improves system scalability, availability, and self-manageability. It is usually very challenging to choose or design an optimal partitioning scheme for a given workload and database. In this technical report, we propose a fine-grained hyper-graph based database partitioning system for transa...
Scaling transactional workloads on the cloud
In this paper, we address the problem of transparently scaling out transactional (OLTP) workloads on relational databases, to support database-as-a-service in a cloud computing environment. The primary challenges in supporting such workloads include choosing how to partition the data across a large number of machines, minimizing the number of distributed transactions, providing high data availabi...
Schism: a Workload-Driven Approach to Database Replication and Partitioning
We present Schism, a novel workload-aware approach for database partitioning and replication designed to improve scalability of shared-nothing distributed databases. Because distributed transactions are expensive in OLTP settings (a fact we demonstrate through a series of experiments), our partitioner attempts to minimize the number of distributed transactions, while producing balanced partition...